A database design for a TTS synthesis system using lexical diphones
Abstract
Database designs based on the premise that English has about 2000 diphones, as stated in many publications and online documents, are likely to yield a diphone database that fails to capture some important phonological phenomena of English. This paper proposes a TTS database built from diphones that include their syllabic stress; we term these units lexical diphones. A comprehensive lexical diphone feature set is generated using a stress-annotated dictionary together with continuous text and speech. A method based on multiple set cover algorithms, applied to wordlists of specialized English usage, and a knowledge-based phonological approach are used to produce a core text corpus of 540 sentences. An objective comparison with other databases shows that our database, considering its size, has a higher concentration of lexical diphones; a subjective evaluation shows that listeners prefer speech containing more lexical than phonemic units.
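The two ideas in the abstract, stress-marked diphones and set-cover-based sentence selection, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes ARPAbet-style phones with stress digits on vowels (the CMUdict convention), uses illustrative transcriptions of "insight" and "incite", pads each word with a hypothetical silence symbol `#`, and substitutes a plain greedy heuristic for the "multiple set cover algorithms" the paper mentions. All function names are hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple

Diphone = Tuple[str, str]

def lexical_diphones(phones: List[str]) -> Set[Diphone]:
    """Adjacent phone pairs with vowel stress digits kept (assumed unit definition)."""
    padded = ["#"] + phones + ["#"]  # '#' is an assumed silence/boundary symbol
    return set(zip(padded, padded[1:]))

def phonemic_diphones(phones: List[str]) -> Set[Diphone]:
    """Adjacent phone pairs with stress digits stripped (conventional diphones)."""
    return lexical_diphones([p.rstrip("012") for p in phones])

def greedy_cover(items: Dict[str, Set[Hashable]]) -> List[str]:
    """Greedy set cover: repeatedly pick the item that adds the most uncovered units."""
    uncovered = set().union(*items.values())
    chosen: List[str] = []
    while uncovered:
        best = max(items, key=lambda name: len(items[name] & uncovered))
        gain = items[best] & uncovered
        if not gain:  # defensive; cannot happen while uncovered is non-empty
            break
        chosen.append(best)
        uncovered -= gain
    return chosen

# Illustrative transcriptions, not taken from the paper.
insight = ["IH1", "N", "S", "AY2", "T"]
incite = ["IH0", "N", "S", "AY1", "T"]

# The two words share every phonemic diphone but differ in lexical diphones.
same_phonemic = phonemic_diphones(insight) == phonemic_diphones(incite)
same_lexical = lexical_diphones(insight) == lexical_diphones(incite)

corpus = {
    "insight": lexical_diphones(insight),
    "incite": lexical_diphones(incite),
}
selected = greedy_cover(corpus)
```

In this toy case a phonemic inventory treats the two words as interchangeable, while a lexical inventory needs both, which is the kind of stress-dependent distinction a count of about 2000 phonemic diphones would miss. The greedy heuristic does not guarantee a minimum-size selection; it is only one common approximation to set cover.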
Similar resources
Increased Diphone Recognition for an Afrikaans TTS system
In this paper we discuss the implementation of an Afrikaans TTS system that is based on diphones. Using diphones makes the system flexible but presents other challenges. A previous effort to design an Afrikaans TTS system was done by SUN. They implemented a TTS system based on full words. A full word based TTS system produces more natural sounding speech than when the system is designed using o...
Design and evaluation of prosodically-sensitive concatenative units for a Korean TTS system
This paper describes the design and evaluation of prosodically-sensitive concatenative units for a Korean text-to-speech (TTS) synthesis system. The diphones used are prosodically conditioned in the sense that a single conventional diphone is stored as different versions taken directly from the different prosodic domains of the prosodically labeled, read sentences. The four levels of the Korean...
Design and evaluation of a speech synthesis model based on concatenation of prosodic-context-sensitive units
This paper describes the design and evaluation of prosodically-sensitive concatenative units for a Persian text-to-speech (TTS) synthesis system. The syllables used are prosodically conditioned in the sense that a single conventional syllable is stored as different versions taken directly from the different prosodic domains of the prosodically labeled, read sentences. The three levels of the Per...
Extraction of Di-phones for Telugu: Issues and solutions
This paper describes a method for extracting diphones to generate a diphone database for concatenative text-to-speech systems. A diphone is an adjacent pair of phones. Diphones are a very important resource for both text-to-speech [TTS] and speech-to-text [STT]. Consider the pronunciation of -kaaki. It consists of phonemes [k], అ [a], అ [a], [k], ఇ[i]. The diphones generated while pronouncing the ...
A Prosodic Diphone Database for Korean Text-to-Speech Synthesis System
This paper presents a prosodically conditioned diphone database to be used in a Korean text-to-speech (TTS) synthesis system. The diphones are prosodically conditioned in the sense that a single conventional diphone is stored as different versions taken directly from the different prosodic domains of the prosodically labeled, read sentences (following the K-ToBI prosodic labeling conventions [3...